Data Cube Indexing of Large-Scale Infosec Repositories

Authors

  • Alfonso Valdes
  • Martin Fong
  • Keith Skinner
Abstract

Analysts examining large-scale information security repositories for propagating network events want to quickly identify temporal and spatial (IP address and/or port) regions containing interesting phenomena, or to correlate events from different time periods. The size of these data sets strains the query capabilities currently provided by, for example, relational databases. We introduce a scalable, animated data cube representation and viewer, suitable for a broad range of observables, to permit coarse-grained detection and correlation in such data sets. We scale from the LAN to the Internet through flexible, locality-preserving hash algorithms that map traffic source and destination (IP addresses, or IP address and port considered simultaneously). The data streams considered include inherently suspicious traffic, such as packets rejected at a firewall, IDS alerts, or traffic to unused address space, as well as NetFlow data. We display observables as intensity plots, where the X and Y coordinates are the hashed source and target addresses and the intensity is proportional to traffic volume. The source and target address spaces may or may not be the same, and may or may not be mapped the same way. Propagating events have distinct visual signatures that can be enhanced through matched filtering techniques. Future work will correlate cubes efficiently through cell-by-cell multiplication. An analyst will be able to examine, for example, whether plots representing two time periods (hours or days) exhibit similar patterns. Multiplication of a cube with its transpose permits identification of nodes that respond to potentially malicious probes. These data cubes permit coarse-grained detection and correlation without expensive database queries.
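The abstract does not give the hash or the cube layout, so the following is only a minimal sketch of the idea under stated assumptions. The names `bucket`, `build_cube`, `cell_product`, and `transpose`, the 256-cell grid, and the linear-scaling map (used here as a simple stand-in for a locality-preserving hash) are all hypothetical and are not the authors' method. The transpose step also assumes that source and target addresses are mapped the same way, and reads "multiplication" as a cell-by-cell product.

```python
import ipaddress
from collections import defaultdict

GRID = 256  # hypothetical number of cells per axis


def bucket(ip, network="0.0.0.0/0", size=GRID):
    """Map an address to a cell index by linear scaling within `network`.

    The map is monotone, so numerically close addresses share a cell or
    land in neighbouring cells. It is an assumed stand-in for the hash
    mentioned in the abstract, not the authors' algorithm.
    """
    net = ipaddress.ip_network(network)
    offset = int(ipaddress.ip_address(ip)) - int(net.network_address)
    if not 0 <= offset < net.num_addresses:
        raise ValueError(f"{ip} is outside {network}")
    return offset * size // net.num_addresses


def build_cube(events, size=GRID):
    """Count events per (time slot, source cell, target cell).

    `events` is an iterable of (slot, src_ip, dst_ip) tuples. The result
    is {slot: {(x, y): count}}, one sparse intensity plane per slot.
    """
    cube = defaultdict(lambda: defaultdict(int))
    for slot, src, dst in events:
        cell = (bucket(src, size=size), bucket(dst, size=size))
        cube[slot][cell] += 1
    return {slot: dict(plane) for slot, plane in cube.items()}


def cell_product(plane_a, plane_b):
    """Cell-by-cell product of two planes, keeping only nonzero cells."""
    return {
        cell: count * plane_b[cell]
        for cell, count in plane_a.items()
        if plane_b.get(cell, 0)
    }


def transpose(plane):
    """Swap the source and target axes of a plane."""
    return {(y, x): count for (x, y), count in plane.items()}


# Made-up sample events: (time slot, source, target).
events = [
    (0, "10.0.0.1", "192.168.1.5"),
    (0, "10.0.0.2", "192.168.1.9"),
    (0, "192.168.1.5", "10.0.0.1"),   # traffic in the reverse direction
    (1, "10.0.0.7", "192.168.200.1"),
]
cube = build_cube(events)

# Cells active in both time slots suggest a recurring pattern.
overlap = cell_product(cube[0], cube[1])

# Cells active in both directions within one slot suggest a response.
replies = cell_product(cube[0], transpose(cube[0]))

print(overlap)
print(replies)
```

Storing each plane as a sparse dictionary keeps the sketch free of dependencies. A dense array per time slot would be the more natural choice for animation and for filtering over the whole plane.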

Similar resources

Numerical Study of Reynolds Number Effects on Flow over a Wall-Mounted Cube in a Channel Using LES

Turbulent flow over wall-mounted cube in a channel was investigated numerically using Large Eddy Simulation. The Selective Structure Function model was used to determine eddy viscosity that appeared in the subgrid scale stress terms in momentum equations. Studies were carried out for the flows with Reynolds number ranging from 1000 to 40000. To evaluate the computational results, data was compa...

Effective and efficient similarity search in scientific workflow repositories

Scientific workflows have become a valuable tool for large-scale data processing and analysis. This has led to the creation of specialized online repositories to facilitate workflow sharing and reuse. Over time, these repositories have grown to sizes that call for advanced methods to support workflow discovery, in particular for similarity search. Effective similarity search requires both high ...

Towards vocabulary-independent speech indexing for large-scale repositories

The Out-Of-Vocabulary problem remains a challenge for word-lattice-based speech indexing. Sub-word-based approaches address this problem effectively for small-scale tasks, but suffer from poor precisions on large-scale databases due to lack of strong language model constraints. We propose a method for searching OOV queries with large-scale databases in two steps. First, result candidates are ex...

Proceedings of the 2nd International Workshop on Exploiting Large Knowledge Repositories and the 1st International Workshop on Automatic Text

Knowledge based applications require linguistic, terminological and ontological resources. These applications are used to fulfill a set of tasks such as semantic indexing, knowledge extraction from text, information retrieval, etc. Using these resources and combining them for the same application is a tedious task with different levels of complexity. This requires their representation in a comm...

Indexing Millions of Packets per Second using GPUs

Network traffic loggers are devices that record a recent window of the entire traffic in one or more network links. The traffic is stored in packet repositories that enable retrospective analyses, e.g., for forensic investigation. Traffic loggers deployed over very high-speed networks must process and store millions of packets per second using commodity hardware. To enable interactive explorati...

Publication date: 2006